Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: final refactoring for Forth generation #749

Merged
merged 35 commits into from
Nov 6, 2022

Conversation

aryan26roy
Copy link
Collaborator

@aryan26roy aryan26roy commented Oct 6, 2022

No description provided.

@aryan26roy
Copy link
Collaborator Author

This PR solves issue #723 ,

@aryan26roy aryan26roy marked this pull request as ready for review October 11, 2022 13:35
@jpivarski jpivarski enabled auto-merge (squash) October 11, 2022 13:35
Copy link
Member

@jpivarski jpivarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We talked about this on our weekly Zoom call. It's great, and the project is done. Congratulations!

@jpivarski jpivarski disabled auto-merge October 11, 2022 13:41
@jpivarski
Copy link
Member

Oh wait—one more thing: AwkwardForth should be used by default now.

Whenever a new AsObjects is constructed, it gets a _forth = True attribute (or that can be a class object attribute). Users who know the secret code can opt-out by setting an instance's _forth = False, but that's an exceptional, unusual situation.

@jpivarski
Copy link
Member

e946dea was the last step needed for this PR, apart from getting the tests to work again. The last error in the logs (one of 6 errors total) was due to some AwkwardForth cases being declared as NotImplementedError. In general, NotImplementedError for AwkwardForth can be replaced with turning off the Forth generation and proceeding with the old code path. So you don't have to handle more Forth cases, just ensure that it gives up appropriately if it can't proceed.

That will "hide" these cases, since they're no longer revealed by NotImplementedError, which trades the good thing of getting Uproot to run for the bad thing of not knowing which code path was taken in a given case. In practice, this will mean that some data types will still have the old deserialization speed and it becomes harder to debug performance issues.

Does the "give up" flag leave any permanent indicator on the AsObjects object, so that we have something to check if a particular deserialization seems to be very slow?

By the way, in the case that I looked at, TClonesArray, it would be very hard to write Forth. TClonesArray is a messy, complex thing.

But 9a7845e was a good thing to have fixed. So I guess not all of these are things to be skipped. (I only looked at the last one in the logs.)

@aryan26roy
Copy link
Collaborator Author

Hey, so some of the tests are failing because of the difference is Awkward Array layout. I will have to figure out those and there's one that seems to be reading strings incorrectly. From what I expected, these errors are much more tame.

@jpivarski
Copy link
Member

A different layout is significant. I guess my shortcut of checking only the last failure was misrepresentative.

When you're dealing with the NotImplementedError, be sure to do all of the cases in which the AwkwardForth is deliberately not implemented. I'm going to call the TClonesArray one hopeless, as are any with memberwise serialization, but if one of them looks like it could be easy and just fills in a case we didn't have a test for before, then at least put a note in a comment that such-and-such a test reaches that part of the code and could be implemented someday.

@aryan26roy
Copy link
Collaborator Author

I should have expanded on the comment. The difference that I spotted was a spelling mistake and some layout parameters that should not have been there. The difference in layout is not in the logical structure (at least I haven't spotted any).

@jpivarski
Copy link
Member

With the latest Awkward Array, all tests pass except for the two that you knew about.

Comment on lines 1439 to 1445
uproot._util.awkward_form(self._values, file, context),
),
None,
parameters={"__array__": "sorted_map"},
),
parameters={"__array__": "sorted_map"},
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only one level of this should be "__array__": "sorted_map". I remember we were talking about this in one of our meetings, and thought about the consequences of both choices. Putting the "__array__": "sorted_map" in with the RecordForm, rather than out with the ListOffsetForm, allows for the possibility that a whole array can be one sorted map, something we'll likely want even if it's not what we have here. (Someone could take this array and do array[0] to get only one list of key-value pairs, and that single list should have map-like semantics someday.)

Suggested change
uproot._util.awkward_form(self._values, file, context),
),
None,
parameters={"__array__": "sorted_map"},
),
parameters={"__array__": "sorted_map"},
)
uproot._util.awkward_form(self._values, file, context),
),
None,
parameters={"__array__": "sorted_map"},
),
)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might affect the test (src/uproot/streamers.py line 285). In which case, the expected test result can be changed again.

@jpivarski jpivarski added the next-release Required for the next release label Oct 31, 2022
@aryan26roy aryan26roy enabled auto-merge (squash) November 6, 2022 09:44
@aryan26roy aryan26roy merged commit ad31773 into main Nov 6, 2022
@aryan26roy aryan26roy deleted the aryan-final-refactor branch November 6, 2022 09:49
@jpivarski jpivarski linked an issue Nov 10, 2022 that may be closed by this pull request
5 tasks
@jpivarski jpivarski removed the next-release Required for the next release label Feb 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Next (last) steps for Forth generation
2 participants